Extended many-item similarity indices for sets of nucleotide and protein sequences

Authors

Abstract

Quantification of similarities between protein sequences or DNA/RNA strands is a (sub-)task that is ubiquitously present in bioinformatics workflows, and is usually accomplished by pairwise comparisons of sequences, utilizing simple (e.g. percent identity) or more intricate concepts (e.g. substitution scoring matrices). Complex tasks (such as clustering) rely on a large number of pairwise comparisons under the hood, instead of a direct quantification of set similarities. Based on our recently introduced framework that enables multiple comparisons of binary molecular fingerprints (i.e., direct calculation of the similarity of fingerprint sets), here we introduce novel symmetric similarity indices for analogous calculations on sets of character sequences with more than two (t) possible items (e.g. DNA/RNA sequences with t = 4, or protein sequences with t = 20). The features of these new indices are studied in detail with analysis of variance (ANOVA), and demonstrated with three case studies of protein/DNA sequences with varying degrees of similarity (or evolutionary proximity). The Python code for the extended many-item similarity indices is publicly available at: https://github.com/ramirandaq/tn_Comparisons.
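To make the contrast between pairwise comparisons and a direct set-level calculation concrete, the following is a minimal sketch, not the extended many-item indices of the paper and not the API of the linked repository. It assumes equal-length, already aligned sequences, and all function names are hypothetical. It computes the mean percent identity of a set twice: once by enumerating every pair, and once from per-column item counts, using the fact that a column in which an item occurs n times contributes n(n-1)/2 matching pairs.

```python
from collections import Counter
from itertools import combinations
from math import comb


def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two aligned sequences carry the same item."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def mean_identity_pairwise(seqs: list[str]) -> float:
    """Average percent identity over all N(N-1)/2 pairs (quadratic in N)."""
    pairs = list(combinations(seqs, 2))
    return sum(percent_identity(a, b) for a, b in pairs) / len(pairs)


def mean_identity_from_columns(seqs: list[str]) -> float:
    """Same quantity from per-column item counts (linear in N).

    Illustrative assumption: this is only a demonstration that a set-level
    value can be obtained without enumerating pairs; it is not one of the
    indices introduced in the paper.
    """
    n_seqs = len(seqs)
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    matching_pairs = 0
    for column in zip(*seqs):
        # Count how often each of the t possible items occurs in this column.
        counts = Counter(column)
        matching_pairs += sum(comb(n, 2) for n in counts.values())
    return 100.0 * matching_pairs / (length * comb(n_seqs, 2))


if __name__ == "__main__":
    dna = ["ACGT", "ACGA", "ACTT"]  # toy DNA set, t = 4 possible items
    print(mean_identity_pairwise(dna))
    print(mean_identity_from_columns(dna))
```

Both functions return the same value for the same input; the column-based version visits each sequence once, which is the practical appeal of quantifying the similarity of a whole set directly.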

Similar resources

New distance and similarity measures for hesitant fuzzy soft sets

The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...

SOME SIMILARITY MEASURES FOR PICTURE FUZZY SETS AND THEIR APPLICATIONS

In this work, we shall present some novel process to measure the similarity between picture fuzzy sets. Firstly, we adopt the concept of intuitionistic fuzzy sets, interval-valued intuitionistic fuzzy sets and picture fuzzy sets. Secondly, we develop some similarity measures between picture fuzzy sets, such as, cosine similarity measure, weighted cosine similarity measure, set-theoretic similar...

Similarity of Event Sequences Extended Abstract

Sequences of events are an important form of data that occurs in many application domains, such as telecommunications, biostatistics, user interface design, etc. We present a simple model for measuring the similarity of event sequences, and show that the resulting measure of distance can be efficiently computed using a form of dynamic programming.

A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation

Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...

Clustering Item Data Sets with Association-Taxonomy Similarity

We explore in this paper the efficient clustering of item data. Different from those of the traditional data, the features of item data are known to be of high dimensionality and sparsity. In view of the features of item data, we devise in this paper a novel measurement, called the association-taxonomy similarity, and utilize this measurement to perform the clustering. With this association-taxo...

Journal

Journal title: Computational and Structural Biotechnology Journal

Year: 2021

ISSN: 2001-0370

DOI: https://doi.org/10.1016/j.csbj.2021.06.021